State space time series clustering using discrepancies based on the Kullback-Leibler information and the Mahalanobis distance
Authors
Abstract
In this thesis, we consider the clustering of time series data, specifically time series that can be modeled in the state space framework. Our primary focus is the pairwise discrepancy between two state space time series. The state space model can be formulated in terms of two equations: the state equation, based on a latent process, and the observation equation. Because the unobserved state process is often of interest, we develop discrepancy measures based on estimates of the state process and compare them to discrepancies based on the observed data. In all, seven novel discrepancies are formulated. First, discrepancies derived from Kullback-Leibler (KL) information and Mahalanobis distance (MD) measures are proposed based on the observed data. Next, KL information and MD discrepancies are formulated based on the composite marginal contributions of the smoothed estimates of the unobserved state process. Furthermore, an MD is created based on the joint contributions of the collection of smoothed estimates of the unobserved state process. The cross trajectory distance, a discrepancy heavily influenced by both observed and smoothed data, is also proposed, as is a Euclidean distance based on the smoothed state estimates. The performance of these seven novel discrepancies is compared to the commonly used Euclidean distance based on the observed data, as well as to a KL information discrepancy based on the joint contributions of the collection of smoothed state estimates (Bengtsson and Cavanaugh, 2008). We find that the discrepancy measures based on the smoothed estimates of the unobserved state process outperform those based on the observed data. The best performance is achieved by the discrepancies founded upon the joint contributions of the collection of unobserved states, followed by the discrepancies derived from the marginal contributions.
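As a rough illustration of the idea of comparing series through smoothed state estimates, the following minimal sketch may help. It is not the thesis's formulation: the abstract does not give the exact definitions, so everything here is an assumption made for illustration, including the scalar local level model, the function names `smooth_local_level`, `marginal_kl`, and `marginal_md`, the use of a symmetrized Gaussian KL divergence summed over time points, and the pooled-variance form of the Mahalanobis-type distance.

```python
def smooth_local_level(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter plus Rauch-Tung-Striebel smoother for the assumed model
    x_t = x_{t-1} + w_t, w_t ~ N(0, q);  y_t = x_t + v_t, v_t ~ N(0, r).
    Returns (smoothed means, smoothed variances) of the latent state."""
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    for obs in y:
        mp, pp = m, p + q                  # one-step prediction
        gain = pp / (pp + r)               # Kalman gain
        m = mp + gain * (obs - mp)         # filtered mean
        p = (1.0 - gain) * pp              # filtered variance
        m_pred.append(mp)
        p_pred.append(pp)
        m_filt.append(m)
        p_filt.append(p)
    m_s, p_s = m_filt[:], p_filt[:]
    for t in range(len(y) - 2, -1, -1):    # backward smoothing pass
        g = p_filt[t] / p_pred[t + 1]
        m_s[t] = m_filt[t] + g * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + g * g * (p_s[t + 1] - p_pred[t + 1])
    return m_s, p_s


def marginal_kl(a, b):
    """Sum over time of the symmetrized KL divergence between the
    Gaussian marginals N(m1, v1) and N(m2, v2) of two smoothed states."""
    total = 0.0
    for m1, v1, m2, v2 in zip(a[0], a[1], b[0], b[1]):
        d2 = (m1 - m2) ** 2
        total += 0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1) - 1.0
    return total


def marginal_md(a, b):
    """Sum over time of squared mean differences scaled by the summed
    smoothed variances (an assumed Mahalanobis-type form)."""
    return sum((m1 - m2) ** 2 / (v1 + v2)
               for m1, v1, m2, v2 in zip(a[0], a[1], b[0], b[1]))


# Hypothetical toy series; q and r are assumed known rather than estimated.
series_a = smooth_local_level([1.0, 1.2, 0.9, 1.1, 1.3], q=0.1, r=1.0)
series_b = smooth_local_level([3.0, 3.1, 2.8, 3.3, 3.0], q=0.1, r=1.0)
print(marginal_kl(series_a, series_b), marginal_md(series_a, series_b))
```

A matrix of such pairwise values could then be passed to any standard clustering routine that accepts precomputed dissimilarities. A joint version would use the full covariance of the smoothed states across time rather than the per-time-point variances used here.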
Similar resources
Discriminant Analysis for ARMA Models Based on Divergency Criterion: A Frequency Domain Approach
The extension of classical analysis to time series data is the basic problem faced in many fields, such as engineering, economics and medicine. The main objective of discriminant time series analysis is to examine how far it is possible to distinguish between various groups. There are two situations to be considered in the linear time series models. Firstly when the main discriminatory informati...
Using Kullback-Leibler distance for performance evaluation of search designs
This paper considers the search problem, introduced by Srivastava [Sr]. This is a model discrimination problem. In the context of search linear models, discrimination ability of search designs has been studied by several researchers. Some criteria have been developed to measure this capability, however, they are restricted in a sense of being able to work for searching only one possibl...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Model Confidence Set Based on Kullback-Leibler Divergence Distance
Consider the problem of estimating the true density h(·) based upon a random sample X1, …, Xn. In general, h(·) is approximated using an appropriate (in some sense, see below) model fθ(x). This article uses Vuong's (1989) test along with a collection of k (> 2) non-nested models to construct a set of appropriate models, say a model confidence set, for the unknown model h(·). Application of such confide...
On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces
Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. In order to reduce the storage space and ensure efficient performance of queries, dimensionality reduction while preserving the inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point o...
Journal title:
Volume, issue
Pages -
Publication date: 2016